AITopics | legendre polynomial

2604.19165

Country:

Asia > China > Shaanxi Province > Xi'an (0.05)
Europe > France > Auvergne-Rhône-Alpes > Puy-de-Dôme > Clermont-Ferrand (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Software > Programming Languages (0.48)
Information Technology > Artificial Intelligence > Machine Learning (0.48)

--Data normalization is crucial in machine learning, usually performed by subtracting the mean and dividing by standard deviation, or by rescaling to a fixed range. In copula theory [1], popular in finance, there is used normalization to approximately quantiles by transforming x CDF (x) with estimated CDF/EDF (cumulative/empirical distribution function) to nearly uniform distribution in [0, 1], allowing for simpler representations which are less likely to overfit. It seems nearly unknown in machine learning, therefore, as proposed in [2], we would like to present some its advantages on example of recently popular Kolmogorov-Arnold Networks (KANs), improving predictions from Legendre-KAN [3] by just switching rescaling to CDF normalization. Additionally, in HCR interpretation, weights of such neurons are mixed moments providing local joint distribution models, allow to propagate also probability distributions, and change propagation direction. Data normalization is very useful for various types of analysis, for example, through batch normalization in neural networks [4].

artificial intelligence, machine learning, normalization, (17 more...)

2507.13393

Country: Europe > Poland (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)

Lienen, Marten, Saydemir, Abdullah, Günnemann, Stephan

UnHiPPO: Uncertainty-aware Initialization for State Space Models

arXiv.org Machine LearningJun-6-2025

State space models are emerging as a dominant model class for sequence problems with many relying on the HiPPO framework to initialize their dynamics. However, HiPPO fundamentally assumes data to be noise-free; an assumption often violated in practice. We extend the HiPPO theory with measurement noise and derive an uncertainty-aware initialization for state space model dynamics. In our analysis, we interpret HiPPO as a linear stochastic control problem where the data enters as a noise-free control signal. We then reformulate the problem so that the data become noisy outputs of a latent system and arrive at an alternative dynamics initialization that infers the posterior of this latent system from the data without increasing runtime. Our experiments show that our initialization improves the resistance of state-space models to noise both at training and inference time. Find our implementation at https://cs.cit.tum.de/daml/unhippo.

artificial intelligence, machine learning, uncertainty-aware initialization, (14 more...)

2506.05065

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ferguson, R. F., Ireland, D. G., McKinnon, B.

Data-driven Approach for Interpolation of Sparse Data

arXiv.org Machine LearningMay-6-2025

Extracting information about hadron resonances requires fitting theoretical models to experimental data. However, this data often comes from different experiments of different physics quantities in varying kinematic regions; studying coupled channels with different kinematic coverages and binning can make direct comparison challenging. The consistency of these datasets directly impacts the quality of the fit, thus making it difficult to accurately constrain the theoretical models. Sparse datasets in key kinematic regions further complicates the quantification of uncertainties, often requiring arbitrary weighting that may introduce bias. A robust approach to solving these problems involves utilising Gaussian Processes (GPs), a Bayesian inference machine learning technique that provides probabilistic predictions for unknown datapoints. Unlike traditional machine learning methods, GPs do not require any training; instead, they operate on three fundamental assumptions: 1. Some kernel function can be defined to measure the covariance between known datapoints; 2. This same kernel function can be used to predict the covariance between unknown datapoints; 3. Some idea of the form of the posterior distribution is known (e.g.

artificial intelligence, datapoint, machine learning, (18 more...)

2505.01473

Country:

Europe > United Kingdom (0.14)
North America > United States > Virginia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)

Neuman, A. Martina, Pineda, Andres Felipe Lerma, Bramburger, Jason J., Brugiapaglia, Simone

Reconstruction of frequency-localized functions from pointwise samples via least squares and deep learning

arXiv.org Artificial IntelligenceFeb-13-2025

Recovering frequency-localized functions from pointwise data is a fundamental task in signal processing. We examine this problem from an approximation-theoretic perspective, focusing on least squares and deep learning-based methods. First, we establish a novel recovery theorem for least squares approximations using the Slepian basis from uniform random samples in low dimensions, explicitly tracking the dependence of the bandwidth on the sampling complexity. Building on these results, we then present a recovery guarantee for approximating bandlimited functions via deep learning from pointwise data. This result, framed as a practical existence theorem, provides conditions on the network architecture, training procedure, and data acquisition sufficient for accurate approximation. To complement our theoretical findings, we perform numerical comparisons between least squares and deep learning for approximating one- and two-dimensional functions. We conclude with a discussion of the theoretical limitations and the practical gaps between theory and implementation.

artificial intelligence, deep learning, machine learning, (14 more...)

2502.09794

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceApr-29-2024

PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models

Aggarwal, Arpit

There are several improvements proposed over the baseline Absolute Positional Encoding (APE) method used in original transformer. In this study, we aim to investigate the implications of inadequately representing positional encoding in higher dimensions on crucial aspects of the attention mechanism, the model's capacity to learn relative positional information, and the convergence of models, all stemming from the choice of sinusoidal basis functions. Through a combination of theoretical insights and empirical analyses, we elucidate how these challenges extend beyond APEs and may adversely affect the performance of Relative Positional Encoding (RPE) methods, such as Rotatory Positional Encoding (RoPE). Subsequently, we introduce an innovative solution termed Orthogonal Polynomial Based Positional Encoding (PoPE) to address some of the limitations associated with existing methods. The PoPE method encodes positional information by leveraging Orthogonal Legendre polynomials. Legendre polynomials as basis functions offers several desirable properties for positional encoding, including improved correlation structure, non-periodicity, orthogonality, and distinct functional forms among polynomials of varying orders. Our experimental findings demonstrate that transformer models incorporating PoPE outperform baseline transformer models on the $Multi30k$ English-to-German translation task, thus establishing a new performance benchmark. Furthermore, PoPE-based transformers exhibit significantly accelerated convergence rates. Additionally, we will present novel theoretical perspectives on position encoding based on the superior performance of PoPE.

higher dimension, legendre polynomial, polynomial, (11 more...)

2405.04585

Country:

Africa (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > Dominican Republic (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.54)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.65)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.47)

Zanger, Benjamin, Cui, Tiangang, Schreiber, Martin, Zahm, Olivier

Sequential transport maps using SoS density estimation and $\alpha$-divergences

arXiv.org Machine LearningFeb-27-2024

Transport-based density estimation methods are receiving growing interest because of their ability to efficiently generate samples from the approximated density. We further invertigate the sequential transport maps framework proposed from arXiv:2106.04170 arXiv:2303.02554, which builds on a sequence of composed Knothe-Rosenblatt (KR) maps. Each of those maps are built by first estimating an intermediate density of moderate complexity, and then by computing the exact KR map from a reference density to the precomputed approximate density. In our work, we explore the use of Sum-of-Squares (SoS) densities and $\alpha$-divergences for approximating the intermediate densities. Combining SoS densities with $\alpha$-divergence interestingly yields convex optimization problems which can be efficiently solved using semidefinite programming. The main advantage of $\alpha$-divergences is to enable working with unnormalized densities, which provides benefits both numerically and theoretically. In particular, we provide two new convergence analyses of the sequential transport maps: one based on a triangle-like inequality and the second on information geometric properties of $\alpha$-divergences for unnormalizied densities. The choice of intermediate densities is also crucial for the efficiency of the method. While tempered (or annealed) densities are the state-of-the-art, we introduce diffusion-based intermediate densities which permits to approximate densities known from samples only. Such intermediate densities are well-established in machine learning for generative modeling. Finally we propose and try different low-dimensional maps (or lazy maps) for dealing with high-dimensional problems and numerically demonstrate our methods on several benchmarks, including Bayesian inference problems and unsupervised learning task.

density estimation, divergence, polynomial, (15 more...)

2402.17943

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
North America > United States > Indiana (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(6 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Maran, Davide, Metelli, Alberto Maria, Papini, Matteo, Restell, Marcello

No-Regret Reinforcement Learning in Smooth MDPs

arXiv.org Artificial IntelligenceFeb-6-2024

Obtaining no-regret guarantees for reinforcement learning (RL) in the case of problems with continuous state and/or action spaces is still one of the major open challenges in the field. Recently, a variety of solutions have been proposed, but besides very specific settings, the general problem remains unsolved. In this paper, we introduce a novel structural assumption on the Markov decision processes (MDPs), namely $\nu-$smoothness, that generalizes most of the settings proposed so far (e.g., linear MDPs and Lipschitz MDPs). To face this challenging scenario, we propose two algorithms for regret minimization in $\nu-$smooth MDPs. Both algorithms build upon the idea of constructing an MDP representation through an orthogonal feature map based on Legendre polynomials. The first algorithm, \textsc{Legendre-Eleanor}, archives the no-regret property under weaker assumptions but is computationally inefficient, whereas the second one, \textsc{Legendre-LSVI}, runs in polynomial time, although for a smaller class of problems. After analyzing their regret properties, we compare our results with state-of-the-art ones from RL theory, showing that our algorithms achieve the best guarantees.

algorithm, mdp, no-regret reinforcement learning, (15 more...)

2402.03792

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Poland (0.04)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)